A novel prognostic signature for lung adenocarcinoma based on cuproptosis-related lncRNAs: A Review

Lung adenocarcinoma (LUAD) is a highly heterogeneous disease with complex pathogenesis, high mortality, and poor prognosis. Cuproptosis is a new type of programmed cell death triggered by copper accumulation that may play an important role in cancer. LncRNAs are becoming valuable prognostic factors in cancer patients. The effect of cuproptosis-related lncRNAs (CRlncRNAs) on LUAD has not been clarified. Based on The Cancer Genome Atlas database, CRlncRNAs were screened by co-expression analysis of cuproptosis-related genes and lncRNAs. A prognostic risk model was constructed from the CRlncRNAs using Cox and LASSO regression analyses. The predictive efficacy of the model was assessed and validated using survival analysis, receiver operating characteristic curves, univariate and multivariate Cox regression analysis, and principal component analysis. A nomogram was constructed and calibration curves were applied to enhance the predictive efficacy of the model. Tumor mutational burden analysis and chemotherapeutic drug sensitivity prediction were performed to assess the clinical feasibility of the risk model. The novel prognostic signature consisted of 5 potentially high-risk CRlncRNAs, MAP3K20-AS1, CRIM1-DT, AC006213.3, AC008035.1, and NR2F2-AS1, and 5 potentially protective CRlncRNAs, AC090948.1, AL356481.1, AC011477.2, AL031600.2, and AC026355.2, and it had accurate and robust predictive power for LUAD patients. Collectively, the novel prognostic signature constructed based on CRlncRNAs can effectively assess and predict the prognosis of patients and provide a new perspective for the diagnosis and treatment of LUAD.


Introduction
Lung cancer (LC) has high morbidity and mortality rates and a poor prognosis, resulting in a heavy disease burden, with an estimated 2 million new cases and 1.76 million deaths per year. [1] Lung adenocarcinoma (LUAD) is a common subtype of LC that develops from small airway epithelial cells and type II alveolar cells and accounts for approximately 40% of all LC. [2,3] Despite many advances in diagnostic and therapeutic strategies for LUAD, it remains 1 of the most aggressive and rapidly lethal tumor types, with a low early diagnosis rate, high late mortality, and poor prognosis. Therefore, there is an urgent need to explore more complete clinical diagnostic methods for LUAD and to discover new and effective biomarkers and therapeutic targets to accurately predict and improve the clinical prognosis of patients with LUAD.
Long non-coding RNAs (lncRNAs) are non-coding transcripts over 200 nucleotides in length that do not encode proteins but play an important role in oncological diseases. [4,5] Related studies have shown that aberrantly expressed lncRNAs have important prognostic value in tumor diseases. [6][7][8] Cuproptosis, a form of programmed cell death caused by intracellular copper accumulation that triggers the aggregation of mitochondrial lipidated proteins and the destabilization of Fe-S cluster proteins, may be a promising strategy for tumor treatment. [9,10] At present, little is known about the potential mechanisms of cuproptosis in LUAD, and cuproptosis-related lncRNAs (CRlncRNAs) in LUAD have yet to be systematically explored. Therefore, the identification of CRlncRNAs closely related to the prognosis of LUAD and the exploration of potential targets related to the mechanism of cuproptosis in LUAD are of great importance for the study of the mechanism of LUAD, as well as the diagnosis, prognosis, and clinical treatment of the disease.
Based on the LUAD dataset from The Cancer Genome Atlas (TCGA), we screened 10 CRlncRNAs to construct a new prognostic model for LUAD, aiming to improve the clinical diagnostic and prognostic accuracy of LUAD and discover potential biomarkers and therapeutic targets, to open new perspectives for individualized precision diagnosis and treatment.
HD and JZ contributed equally to this work.
The authors have no conflicts of interest to disclose.
The datasets generated during and/or analyzed during the current study are publicly available.

Data sources and processing
LUAD RNA-seq transcriptomic, clinical, and tumor mutational burden (TMB) data were downloaded from TCGA database (https://portal.gdc.cancer.gov/). RNA-seq transcriptome expression data were normalized, and samples with missing clinical data were excluded. Eighteen cuproptosis-related genes (CRGs) were retrieved from the literature, [9,11-15] including NLRP3, ATP7B, ATP7A, SLC31A1, FDX1, LIAS, LIPT1, LIPT2, DLD, DLAT, PDHA1, PDHB, MTF1, GLS, CDKN2A, DBT, GCSH, and DLST. The LUAD transcriptome data were collated, a Perl (Strawberry Perl 5.30) script was used to distinguish between mRNAs and lncRNAs, and the "limma" package in R (x64 4.1.0) was used to extract the expression of CRGs. Co-expression between the CRGs and lncRNAs was analyzed using the cor.test function and the Wilcoxon test, and CRlncRNAs were selected using the filtering conditions cor > 0.4 and P < .001. All data in this study were obtained from public databases, so informed consent was not obtained from the patients involved.
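The co-expression screen described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the expression values and lncRNA names (lnc_A, lnc_B) are hypothetical, the function names are invented for illustration, and only the correlation threshold (cor > 0.4) is applied. The P < .001 criterion would additionally require a significance test such as the one R's cor.test provides, which is omitted here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def screen_coexpressed(crg_expr, lncrna_expr, cor_cutoff=0.4):
    """Return (lncRNA, CRG, r) triples whose correlation exceeds the cutoff."""
    hits = []
    for lnc, lnc_values in lncrna_expr.items():
        for gene, gene_values in crg_expr.items():
            r = pearson_r(lnc_values, gene_values)
            if r > cor_cutoff:
                hits.append((lnc, gene, round(r, 3)))
    return hits

# Hypothetical toy expression values across six samples (not TCGA data).
crg_expr = {"FDX1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
lncrna_expr = {
    "lnc_A": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],  # tracks FDX1 closely
    "lnc_B": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],  # weakly related to FDX1
}
hits = screen_coexpressed(crg_expr, lncrna_expr)
print(hits)
```

In this toy input only lnc_A passes the threshold; in a real analysis the same loop would run over every CRG and lncRNA pair.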

Construction and assessment of the CRlncRNA prognostic signature for LUAD
To construct the best LUAD CRlncRNA prognostic model, tumor samples were randomly divided into training and test cohorts at a 1:1 ratio, and the chi-square test was applied to compare the differences between groups. The training cohort data were used for prognostic model construction, and the test cohort and the total cohort were used for model validation. Using the R "survival," "survminer," and "glmnet" packages, we first identified candidate CRlncRNAs by univariate Cox regression analysis (P < .01), then reduced overfitting by least absolute shrinkage and selection operator (LASSO) regression, and finally constructed the best risk prognosis model from the results of multivariate Cox regression analysis. The prognostic risk score for each LUAD sample was calculated as follows:

$$\text{risk score} = \sum_{i=1}^{n} \text{coef}_i \times \text{expr}_i$$

where $\text{expr}_i$ represents the expression of each lncRNA in the model, $\text{coef}_i$ represents the corresponding coefficient, and $n$ is the number of lncRNAs in the model. According to the median risk score, LUAD samples were divided into high- and low-risk groups. The expression of the CRlncRNAs involved in risk model construction was extracted and correlated with CRGs to explore the potential mutual regulatory relationships between them.
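As a minimal illustration of the risk score and the median split described above, the following Python sketch sums coefficient-weighted expression values and assigns each sample to a group. The coefficients, lncRNA names, and sample values are hypothetical and are not the fitted values of the model; treating a score equal to the median as low-risk is also an assumption of this sketch.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Sum of coef_i * expr_i over the lncRNAs in the model."""
    return sum(coefficients[name] * expression[name] for name in coefficients)

def split_by_median(scores):
    """Label samples above the median score 'high', all others 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if score > cutoff else "low")
            for sample, score in scores.items()}

# Hypothetical coefficients: positive for a risk lncRNA, negative for a protective one.
coefficients = {"lnc_risk": 0.5, "lnc_protective": -0.25}

# Hypothetical expression values for four samples.
samples = {
    "S1": {"lnc_risk": 2.0, "lnc_protective": 4.0},
    "S2": {"lnc_risk": 4.0, "lnc_protective": 2.0},
    "S3": {"lnc_risk": 1.0, "lnc_protective": 6.0},
    "S4": {"lnc_risk": 6.0, "lnc_protective": 0.0},
}

scores = {name: risk_score(expr, coefficients) for name, expr in samples.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

A positive coefficient raises the score as expression increases, which is why lncRNAs with positive coefficients are read as risk factors and those with negative coefficients as protective.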
Kaplan-Meier survival analysis, risk score curves, and survival status scatterplots were applied to the training cohort using the R "survival" and "survminer" packages to compare survival between the high- and low-risk groups, and the test cohort and the total cohort were used for validation. For the total cohort, receiver operating characteristic (ROC) curves were drawn and the area under the curve (AUC) was used to evaluate the model's predictive performance. Using univariate and multivariate Cox regression analyses, we assessed whether the risk model could be used as an independent prognostic factor for LUAD. Clinical subgroups were established according to age (≤65 or >65 years), sex (male or female), and clinical stage (stage I-II or III-IV). Kaplan-Meier survival analysis was performed to verify whether the model could be used in patients with different clinical characteristics. The R "rms" [16] package was applied to construct a nomogram to predict survival according to sex, age, disease stage, and risk group, and calibration curves were drawn to test the accuracy of the nomogram.
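To make the survival comparison concrete, the following Python sketch computes a Kaplan-Meier survival curve for one group. It is a simplified stand-in for what the R "survival" package does, not the authors' code; the follow-up times and event indicators are hypothetical, and the function name is chosen for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up data for five patients in one risk group.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Running the same function separately on the high- and low-risk groups gives the two curves that a log-rank test would then compare.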

Principal component analysis
Principal component analysis (PCA) [17] was performed to compare the differences between the high- and low-risk groups based on all gene sets, the CRGs set, the CRlncRNA set, and the prognostic model 10-CRlncRNA set.

Tumor mutational burden analysis
TMB is the total number of non-synonymous mutations in each coding region of the tumor genome, and its calculation can indirectly reflect the ability and extent of neoantigen production by tumors. [18,19] Based on the LUAD mutation data downloaded from TCGA database (category: simple nucleotide variation, type: masked somatic mutation, format: maf), TMB was calculated as TMB = Somatic/L, where Somatic is the total number of mutations and L is the size of the effective coding region. The R "maftools" package was applied to analyze the mutation data and construct a waterfall plot to visualize the mutations in the high-risk and low-risk groups. To investigate the prognostic value of TMB, LUAD samples were divided into high TMB (H-TMB) and low TMB (L-TMB) groups according to the median TMB score, and survival analysis was performed in combination with the risk score to compare the differences between the groups.
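The TMB calculation and the median split can be sketched in Python as follows. The mutation counts are hypothetical, and the effective coding region size of 38 Mb is an assumption made for this example only, since the text does not state the value of L that was used. Assigning samples at the median to the L-TMB group is likewise an assumption.

```python
from statistics import median

# Assumed size of the effective coding region in megabases (not given in the text).
EFFECTIVE_CODING_REGION_MB = 38.0

def tumor_mutational_burden(somatic_mutations, coding_region_mb=EFFECTIVE_CODING_REGION_MB):
    """TMB = Somatic / L, expressed as mutations per megabase."""
    return somatic_mutations / coding_region_mb

# Hypothetical somatic mutation counts per sample.
mutation_counts = {"S1": 76, "S2": 380, "S3": 190, "S4": 38}

tmb = {sample: tumor_mutational_burden(n) for sample, n in mutation_counts.items()}
cutoff = median(tmb.values())
tmb_group = {sample: ("H-TMB" if value > cutoff else "L-TMB")
             for sample, value in tmb.items()}
print(tmb)
print(tmb_group)
```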

Chemotherapeutic drugs sensitivity prediction
The half-maximal inhibitory concentration (IC50) is the amount of drug required to inhibit a biological process by half, with lower values indicating greater drug sensitivity, and is widely used as a measure of drug efficacy. [20,21] The "pRRophetic" package can predict the IC50 of chemotherapeutic drugs, and thus drug sensitivity, from gene expression data. [22] Based on the Cancer Genome Project (CGP) cell line expression profiles and TCGA LUAD gene expression profiles, we applied the R "pRRophetic" package to predict the IC50 of chemotherapeutic drugs. Sensitive chemotherapeutic drugs were screened by the Pearson correlation coefficient between IC50 and risk score, and their IC50 values were compared between the high- and low-risk groups using the Wilcoxon rank-sum test.
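The group comparison of predicted IC50 values can be illustrated with a small Python implementation of the Wilcoxon rank-sum test. This is a sketch under stated assumptions rather than the authors' R workflow: it uses the normal approximation without continuity or tie correction, the function names are invented for illustration, and the IC50 values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Return (U statistic for group_a, two-sided P from the normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

# Hypothetical predicted IC50 values for one drug.
low_risk_ic50 = [1.0, 1.2, 1.4, 1.5, 1.7]
high_risk_ic50 = [2.1, 2.3, 2.6, 2.8, 3.0]
u_stat, p_value = rank_sum_test(low_risk_ic50, high_risk_ic50)
print(u_stat, p_value)
```

A U statistic of zero for the low-risk group means every low-risk IC50 is below every high-risk IC50, the pattern that would mark a drug as a candidate for low-risk patients.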

Construction and assessment of the CRlncRNA prognostic signature for LUAD
A flowchart for constructing and assessing the prognostic signature of CRlncRNAs in LUAD is shown in Figure 1. Based on the CRG and lncRNA expression files, co-expression analysis (cor > 0.4, P < .001) was performed, and 1611 significantly correlated CRlncRNAs were obtained. Samples with missing expression and clinical data were removed, and the included tumor samples (n = 497) were randomly divided into a training cohort (n = 249) and a test cohort (n = 249) in a ratio of 1:1. There were no significant differences (P > .05) in age, sex, tumor stage, or other clinical characteristics between the groups (Table 1). The expression files of the CRlncRNAs in the training cohort were subjected to univariate Cox regression analysis (P < .01), and 20 candidate CRlncRNAs were found to be significantly associated with survival (Fig. 2A). Overfitting genes were reduced by LASSO constraint parameters. According to the median risk scores, the LUAD samples were divided into a high-risk group and a low-risk group. The CRGs were correlated with the 10 CRlncRNAs involved in model construction by correlation analysis (Fig. 2D), and the results showed that FDX1 had a highly significant positive correlation with CRIM1-DT (P < .001) and PDHB had a highly significant negative correlation (P < .001).
Kaplan-Meier analysis showed that overall survival (OS) was better in the low-risk group than in the high-risk group, suggesting that the risk score could predict OS (Fig. 3A). Risk scores and survival status were visualized using risk score curves and survival status scatterplots. The results showed that higher risk scores were associated with higher mortality, indicating that risk scores can predict mortality (Fig. 3D and G). The expression heatmap of the 10 CRlncRNAs in the model in the high- and low-risk groups (Fig. 3J) showed that AC090948.1, AL356481.1, AC011477.2, AL031600.2, and AC026355.2 were highly expressed in the low-risk group, and MAP3K20-AS1, CRIM1-DT, AC006213.3, AC008035.1, and NR2F2-AS1 were highly expressed in the high-risk group. To further test the predictive performance of the model, the test cohort (Fig. 3B, E, H, and K) and the total cohort (Fig. 3C, F, I, and L) were used for validation, and the results were consistent with those of the training cohort.
Receiver operating characteristic (ROC) curves evaluate the accuracy of a diagnostic test by the AUC. [23] Clinical multi-index ROC curves showed an AUC value of 0.738 for the risk score, higher than those of other clinical factors (Fig. 4A). Time-dependent ROC curves predicted OS at 1, 3, and 5 years with AUC values of 0.738, 0.630, and 0.687, respectively (Fig. 4B). This indicates that the risk score has better predictive power than other clinical factors and can predict survival time. In the univariate Cox regression analysis, the hazard ratio of the risk score was 1.042 (95% CI 1.028-1.055) (P < .001) (Fig. 4C), and in the multivariate Cox regression analysis, it was 1.040 (95% CI 1.026-1.053) (P < .001) (Fig. 4D), indicating that the constructed risk prognostic model for CRlncRNAs could be used as an independent prognostic factor for LUAD.
Using age, sex, stage, and risk score as factors, a nomogram was developed to predict the survival rate of patients with LUAD at 1, 3, and 5 years. For example, for a patient (male or female) aged 65 years, in the low-risk group, and at a given stage, the total was 123 points according to the corresponding points in the nomogram, and the corresponding 5-year survival rate was 0.724, the 3-year survival rate was 0.868, and the 1-year survival rate was 0.968 (Fig. 4E). The calibration charts for 1, 3, and 5 years were all close to the solid gray line, indicating that the nomogram had good predictive ability (Fig. 4F). The results of the clinical subgroup survival analysis of LUAD showed that the survival of high-risk patients was lower than that of low-risk patients in every subgroup defined by age (≤65 years, >65 years), sex (female, male), and clinical stage (stages I-II and III-IV) (Fig. 4G-L), indicating that this prognostic model applies to patients with different clinical characteristics.

Principal components analysis
PCA was performed on LUAD samples based on all gene sets, the CRGs set, the CRlncRNA set, and the prognostic model 10-CRlncRNA set. The results showed (Fig. 5) that in the prognostic model 10-CRlncRNA set, the high-risk and low-risk groups were distributed in distinct directions. This suggests that the CRlncRNA risk model could divide patients with LUAD into high-risk and low-risk groups and that the cuproptosis status of the 2 groups was different.

Tumor mutation burden analysis
The TMB of LUAD samples in TCGA database was calculated to analyze the differences in gene mutations between the high- and low-risk groups. The mutation rate in the high-risk group was 90.31%, and the top 5 genes with high mutation frequencies were TTN (44%), TP53 (42%), CSMD3 (40%), MUC16 (38%), and RYR2 (36%) (Fig. 6A). The mutation rate in the low-risk group was 90.35%, higher than that in the high-risk group, and the top 5 genes with the highest mutation frequencies were TP53 (50%), TTN (44%), MUC16 (43%), CSMD3 (36%), and RYR2 (36%) (Fig. 6B). Missense mutations were the main form of mutation in both groups of LUAD samples. To investigate the prognostic value of TMB, the samples were divided into high- and low-TMB groups according to the median TMB score, and survival analysis was performed. The results showed that the median survival time (MST) of the high TMB group was better than that of the low TMB group (P < .05) (Fig. 6C). Combined survival analysis of TMB and risk score showed a highly statistically significant difference in MST among the 4 groups (P < .001), with the best prognosis in the high TMB + low-risk group and the worst prognosis in the low TMB + high-risk group (Fig. 6D).

Table 1 The sociodemographic information of patients.

Chemotherapeutic drugs sensitivity prediction
Chemotherapy is an important treatment option in patients with LUAD. Sensitive chemotherapeutic drugs were screened by predicting their IC50 values to further investigate the correlation between the risk prognostic model and chemotherapeutic drug sensitivity. The results showed that the IC50 values of 8 chemotherapeutic agents, CP724714, FH535, gefitinib, MP470, NSC-207895, PD-0325901, rTRAIL, and TAK-715, were positively correlated with the risk score (Fig. 7A-H) and were lower in the low-risk group than in the high-risk group (P < .001) (Fig. 8A-H), indicating that they could be candidates for LUAD patients in the low-risk group. The IC50 values of 11 chemotherapeutic agents, (5Z)-7-Oxozeaenol, A-770041, AP-24534, BEZ235, CGP-60474, cytarabine, dasatinib, pazopanib, saracatinib, THZ-2-49, and WH-4-023, were negatively correlated with the risk score (Fig. 7I-S) and were lower in the high-risk group than in the low-risk group (P < .001) (Fig. 8I-S), indicating that they could be used as candidates for LUAD patients in the high-risk group. Therefore, the model risk profile could be used as a potential indicator for predicting drug sensitivity.

Discussion
LUAD is a highly heterogeneous tumor with complex mechanisms, high morbidity and mortality rates, and a poor prognosis. Robust prognosis prediction models can help accurately assess and predict the prognosis of the disease and formulate accurate individualized treatment strategies. Copper (Cu) is involved in important biological processes related to cancer, including mitochondrial respiration, immune system regulation, antioxidant defense, collagen cross-linking, autophagy, and mitogenic signaling. [24][25][26] Cuproptosis is programmed cell death caused by intracellular copper accumulation that triggers the aggregation of mitochondrial lipidated proteins and the destabilization of Fe-S cluster proteins. Copper ion carriers have been used as anticancer agents to promote cuproptosis, [27] suggesting that intervention in cuproptosis may be a new target for cancer therapy. LncRNAs play a central role in maintaining various biological activities in tumors and can be used as potential prognostic biomarkers to provide new options for clinical treatment. [28][29][30] The literature shows that lncRNAs are closely related to the occurrence and development of LUAD, [31][32][33] and are of great significance for guiding the clinical prognosis of LUAD, [34][35][36] whereas the correlation between cuproptosis-related lncRNAs and LUAD needs to be further systematically explored.
The model was constructed using a training cohort for OS and mortality prediction. The results showed poorer OS and higher mortality in the high-risk group than in the low-risk group and were validated in the test cohort and the total cohort. The predictive ability of the model for patients with LUAD was verified using clinical multi-index and time-dependent ROC curves. The model was validated as an independent prognostic factor for LUAD using univariate and multivariate Cox regression analysis. A nomogram was constructed from age, sex, stage, and risk score, and calibration charts were applied to test it, enhancing the predictive power for LUAD prognosis. Kaplan-Meier survival analysis by clinical subgroup illustrated that the model is equally applicable to the prediction of survival in patients with different clinical characteristics. In PCA based on the prognostic model of CRlncRNAs, samples were clearly separated into high- and low-risk groups, suggesting that cuproptosis status may differ between the 2 groups. This shows that the LUAD prediction model constructed from the above 10 CRlncRNAs has accurate and robust prediction ability.
To better assess the clinical feasibility of the risk model, TMB analysis and chemotherapeutic drug sensitivity prediction were performed. TMB is a key predictor of the clinical benefit of immunotherapy and can predict the survival prognosis of cancer immunotherapy, and related studies have shown that high TMB is associated with a better survival prognosis. [47,48] TP53 mutation is 1 of the common mutations in early LUAD, [49] and both missense and nonsense mutations of TP53 are associated with TMB and elevated neoantigen levels. [50] The TMB results of this study showed that TP53 was among the most frequently mutated genes, dominated by missense and nonsense mutations, and the TP53 mutation rate was 50% in the low-risk group, which was higher than the 42% in the high-risk group. Survival analysis of samples from the high and low TMB groups showed that MST was better in the high TMB group than in the low TMB group (P < .05) (Fig. 6C). Combining TMB with the risk score for survival analysis showed that the high TMB + low-risk group had the best prognosis (P < .001) and the low TMB + high-risk group had the worst prognosis (P < .001) (Fig. 6D). Thus, this model, combined with TMB, can accurately predict LUAD prognosis and be used as a novel predictive clinical biomarker. In addition, the correlation analysis between the IC50 of chemotherapeutic drugs and the risk score, together with the analysis of the difference in chemotherapeutic drug sensitivity between the high- and low-risk groups, indicates that the model can be used as a potential indicator to predict the sensitivity of chemotherapeutic drugs and provide better treatment strategies for patients with LUAD.
However, this study has some limitations. First, this was a retrospective analysis based on publicly available data from TCGA. Therefore, further prospective analyses are needed to validate the clinical application of CRlncRNA prediction models in patients with LUAD. Second, studies on the mechanism of cuproptosis are still in their initial stages. The number of CRlncRNAs retrieved from the literature is relatively limited, and no studies have yet elucidated the mechanism of lncRNA regulation of cuproptosis in LUAD. This is an exploratory study at the frontier of the field, starting with CRlncRNAs, that aims to open new perspectives for the elucidation of LUAD mechanisms, the discovery of potential targets, and clinical treatment.

Acknowledgments
This study was funded by the Taishan Scholars Construction Project (no. 201712096).

Author contributions
Conceptualization: Huang